An Adaptive Meta-scheduler for Data-Intensive Applications
Authors
Abstract
In data-intensive domains such as high-energy physics and bioinformatics, we encounter applications involving numerous jobs that access and generate large datasets. Scheduling such applications effectively is challenging because both computational resources and data storage resources must be taken into account. In this paper, we describe an adaptive scheduling model that considers the availability of computational, storage, and network resources. Based on this model, we implement a scheduler used in our campus grid. The results achieved by our scheduler are analyzed by comparison with the Greedy algorithm, which is widely used in computational grids and some data grids.
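The paper's scheduler is not reproduced here; the following is only a minimal illustrative sketch of the kind of site selection the abstract describes, in which an adaptive policy weighs compute, storage, and network availability against a greedy, CPU-only baseline. All class names, fields, and cost formulas are assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    free_cpus: int           # currently idle worker slots
    free_storage_gb: float   # available scratch space
    bandwidth_mbps: float    # measured bandwidth to the job's input data
    cpu_speed: float         # relative compute speed (1.0 = reference node)

@dataclass
class Job:
    cpu_hours: float         # estimated compute demand on the reference node
    input_gb: float          # size of input data to stage in
    output_gb: float         # size of output data to store

def completion_estimate(job: Job, site: Site) -> float:
    """Rough makespan estimate: data staging time plus compute time."""
    if site.free_cpus == 0 or site.free_storage_gb < job.input_gb + job.output_gb:
        return float("inf")                      # site cannot run the job at all
    transfer_s = job.input_gb * 8000.0 / site.bandwidth_mbps   # GB -> megabits / Mbps
    compute_s = job.cpu_hours * 3600.0 / site.cpu_speed
    return transfer_s + compute_s

def adaptive_schedule(job: Job, sites: list[Site]) -> Site:
    """Adaptive policy: pick the site with the lowest combined transfer + compute estimate."""
    return min(sites, key=lambda s: completion_estimate(job, s))

def greedy_schedule(job: Job, sites: list[Site]) -> Site:
    """Greedy baseline: pick the site with the most free CPUs, ignoring data placement."""
    return max(sites, key=lambda s: s.free_cpus)
```

Under these assumptions, the greedy baseline can send a data-heavy job to a CPU-rich site behind a slow link, while the adaptive estimate keeps it near its data when staging would dominate the runtime.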
Similar resources
Bulk Scheduling with DIANA Scheduler
Results from and progress on the development of a Data Intensive and Network Aware (DIANA) Scheduling engine, primarily for data intensive sciences such as physics analysis, are described. Scientific analysis tasks can involve thousands of computing, data handling, and network resources and the size of the input and output files and the amount of overall storage space allocated to a user necess...
An Online Scheduling Algorithm with Advance Reservation for Large-Scale Data Transfers
Scientific applications and experimental facilities generate massive data sets that need to be transferred to remote collaborating sites for sharing, processing, and long term storage. In order to support increasingly data-intensive science, next generation research networks have been deployed to provide high-speed on-demand data access between collaborating institutions. In this paper, we pres...
Scheduling in Data Intensive and Network Aware (DIANA) Grid Environments
In Grids, scheduling decisions are often made on the basis of jobs being either data or computation intensive: in data-intensive situations jobs may be pushed to the data, and in computation-intensive situations data may be pulled to the jobs. This kind of scheduling, in which there is no consideration of network characteristics, can lead to performance degradation in a Grid environment and may r...
Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
The Hadoop MapReduce framework is an important distributed processing model for large-scale data-intensive applications. The current Hadoop and the existing Hadoop Distributed File System's rack-aware data placement strategy assume a homogeneous cluster, in which each node has the same computing capacity and is assigned the same workload. Default Hadoop d...
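The excerpt ends before describing the proposed algorithm, so the sketch below is only a generic illustration of capacity-aware data placement for a heterogeneous cluster: blocks are distributed in proportion to assumed per-node computing-capacity ratios rather than evenly. The function name, the capacity figures, and the largest-remainder rounding are hypothetical and are not taken from the paper.

```python
def proportional_placement(block_count: int, capacities: dict[str, float]) -> dict[str, int]:
    """Assign file blocks to nodes in proportion to their (assumed)
    computing-capacity ratios, instead of splitting them evenly."""
    total = sum(capacities.values())
    # Ideal (fractional) share for every node.
    shares = {node: block_count * cap / total for node, cap in capacities.items()}
    # Round down, then hand the leftover blocks to the nodes with the
    # largest fractional remainders (largest-remainder method).
    placement = {node: int(share) for node, share in shares.items()}
    leftover = block_count - sum(placement.values())
    for node in sorted(shares, key=lambda n: shares[n] - placement[n], reverse=True)[:leftover]:
        placement[node] += 1
    return placement

# Example: a 3-node heterogeneous cluster where node_a is twice as fast as node_c.
print(proportional_placement(100, {"node_a": 2.0, "node_b": 1.5, "node_c": 1.0}))
# -> {'node_a': 45, 'node_b': 33, 'node_c': 22}
```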
A User Centered Evolutionary Scheduling Framework
The need for supporting CSCW applications with heterogeneous and varying user requirements calls for adaptive and reconfigurable schedulers that accommodate a mixture of real-time, proportional-share, fixed-priority and other policies, thus overcoming frustrating processor bottlenecks. In this paper we try to overcome this anomaly by proposing an evolutionary strategy for a Meta Hierarchical Schedu...